@loicdiridollou
Member

Resolution for:

  • read_orc (although there are no docs or tests in the pandas repo to properly test it)
  • Timestamp - datetime64 (the __rsub__ is typed, but mypy sees the wrong type)
  • tz_localize returning NaT (edge case when the time changes with DST)

def read_orc(
path: FilePath | ReadBuffer[bytes],
columns: list[HashableT] | None = None,
dtype_backend: DtypeBackend | _NoDefaultDoNotUse = "numpy_nullable",
Contributor

In 3.0, the default value will switch from "numpy_nullable" to no_default. I think this could confuse users on the stable 2.x releases. What should we do, @Dr-Irv? Anyway, we can address this in a separate PR.

Collaborator

Use the 2.3 default for now, and create an issue that we need to fix it when 3.0 is released.

I still think there are things in the release notes for 2.0, 2.1, 2.2 and 2.3 that we never put in the stubs. I created a 2.0 tracker, and I think many of those items are still open. I never had the time to create trackers for 2.1, 2.2 and 2.3. When 3.0 is released, we should create a 3.0 tracker, so we could wait until then instead of creating the issue now.

Contributor

Comment on lines +103 to +104
# TODO: pandas-dev/pandas-stubs#1432 mypy sees datetime.timedelta but pyright is correct
# check(assert_type(ts_np - ts, pd.Timedelta), pd.Timedelta)

I see that our __rsub__ is correct, but mypy gives ts_np.__sub__ higher priority. This has happened a lot in Series and Index arithmetic. I think neither mypy nor pyright is correct; we should see Any here because of the discrepancy. Do you have any suggestions for improving this, @Dr-Irv?
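For context, here is a minimal sketch of how Python dispatches binary operators at runtime: the left operand's `__sub__` is tried first, and the right operand's `__rsub__` is consulted only if the left one returns `NotImplemented`. The classes `Left`, `Strict` and `Right` are hypothetical stand-ins, not numpy or pandas types, and this illustrates the runtime rule only. How each type checker models that rule is a separate question.

```python
class Left:
    """Hypothetical left operand whose __sub__ accepts anything."""

    def __sub__(self, other: object) -> str:
        return "Left.__sub__"


class Strict:
    """Hypothetical left operand that declines the operation."""

    def __sub__(self, other: object):
        # Returning NotImplemented tells Python to try the reflected method.
        return NotImplemented


class Right:
    """Hypothetical right operand that defines reflected subtraction."""

    def __rsub__(self, other: object) -> str:
        return "Right.__rsub__"


# The left operand wins whenever its __sub__ handles the call.
print(Left() - Right())    # Left.__sub__
# The reflected method is only reached after the left operand declines.
print(Strict() - Right())  # Right.__rsub__
```

If the left type's annotated `__sub__` accepts the right operand, a checker that follows this order never gets as far as the `__rsub__` annotation, which would be consistent with what is described above.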

Collaborator

This is a bit of a mess, because I don't think that numpy can know whether it supports subtraction or not with a datetime:

>>> import numpy as np
>>> import datetime as dt
>>> ts_np = np.datetime64("2021-01-01")
>>> ts_dt = dt.datetime(2020,12,1)
>>> ts_np - ts_dt
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'datetime.date' and 'datetime.datetime'
>>> ts_dt = dt.datetime(2020,12,1,3,24)
>>> ts_np - ts_dt
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'datetime.date' and 'datetime.datetime'
>>> ts_np = np.datetime64("2021-01-01T03:24")
>>>
>>> ts_np - ts_dt
datetime.timedelta(days=31)
>>> ts_dt = dt.datetime(2021,12,1)
>>> ts_np - ts_dt
datetime.timedelta(days=-334, seconds=12240)

So if the numpy datetime64 does not specify hour/minute, then you can't do subtraction. But if it does have the hour/minute, you can. And their typing can't see the difference, which makes sense.
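The error message in the session above names `datetime.date`, which suggests that the day-resolution value is handled as a plain date. The standard-library sketch below reproduces the same two outcomes without numpy. The helper `subtract_or_error` is hypothetical, and treating the numpy case as equivalent to `datetime.date` is an inference from the traceback, not something confirmed here.

```python
import datetime as dt


def subtract_or_error(left, right):
    """Return left - right, or the TypeError message if subtraction fails."""
    try:
        return left - right
    except TypeError as exc:
        return str(exc)


day_only = dt.date(2021, 1, 1)               # no hour/minute information
with_time = dt.datetime(2021, 1, 1, 3, 24)   # hour/minute present
other = dt.datetime(2020, 12, 1, 3, 24)

# A date minus a datetime is rejected at runtime.
print(subtract_or_error(day_only, other))

# A datetime minus a datetime works and yields a timedelta.
print(subtract_or_error(with_time, other))
```

Both left-hand values share a common static ancestry (`datetime.datetime` subclasses `datetime.date`), which mirrors the point that annotations alone cannot separate the failing case from the working one.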

I suggest doing tests with a numpy type that has no hour/minute resolution (as in this PR) AND with hour/minute resolution.

Without the resolution, it will fail at runtime, but we can't detect it, so put it in if TYPE_CHECKING_INVALID_USAGE with the ignores that detect it (in this case, we need a pyright ignore, but not a mypy one).

With the resolution, it will work at runtime, but we can't get the right type with either type checker (pyright will reject it, and mypy will fail in assert_type), so put in the appropriate ignores.

In both cases, include comments as to what is going on.

Note: I think pyright is wrong here in not using the numpy type declarations to figure out that np.datetime64.__sub__(pd.Timestamp) is valid from a type perspective, because pd.Timestamp is a subclass of datetime.datetime. But the __sub__() in numpy will take precedence here because it has to allow __sub__(datetime.datetime).

Created a pyright issue: microsoft/pyright#11135

Development

Successfully merging this pull request may close these issues.

Issues from TODOs in timestamps.pyi, timedeltas.pyi and orc.pyi
